Distilling Closed Models Until They Forget They Were Closed
I have been thinking about model distillation lately. Not the academic kind with proper methodology and peer review. The hobbyist kind where someone spends their own money on API credits, LoRA fine-tunes a small model, and releases it for free because they can.
This is actually pretty cool. People are spending their own money to make AI more accessible. They are essentially paying to extract knowledge from closed systems and sharing it with everyone. It is like open-source piracy, but for neural networks, and somehow more legally ambiguous.
The Real Question
Here is where my brain went down a dangerous path. What if, instead of LoRA, you did full SFT on a small base model? Take something like a Qwen3.5 0.8B base variant. Feed it enough examples from a closed-source teacher model. Just prompt the teacher, collect the outputs, train the student on those outputs.
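The collection half of that loop is simple enough to sketch. This is a minimal sketch, not a working scraper: `query_teacher` is a hypothetical stand-in for whatever API client you would actually point at the closed model, and the JSONL prompt/completion layout is just one common shape for SFT training data.

```python
import json

# Hypothetical stand-in for querying the closed teacher model.
# In practice this would be an HTTP call to the provider's API;
# it is stubbed here so the sketch runs offline.
def query_teacher(prompt: str) -> str:
    return f"[teacher completion for: {prompt}]"

def build_distillation_dataset(prompts, path):
    """Collect (prompt, teacher output) pairs as JSONL, one record
    per line, in a shape most SFT tooling can consume."""
    with open(path, "w") as f:
        for prompt in prompts:
            record = {
                "prompt": prompt,
                "completion": query_teacher(prompt),
            }
            f.write(json.dumps(record) + "\n")

prompts = [
    "Explain LoRA in one sentence.",
    "What is 2 + 2?",
]
build_distillation_dataset(prompts, "distill.jsonl")
```

The expensive part is not the code; it is the API bill for generating enough of these pairs, and then the SFT run itself on the resulting file.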
With enough examples, would the student not just become the teacher? Not exactly, of course. The capacity is different. The architecture might differ. But functionally, for most tasks, would you be able to tell the difference?
If you train a small open model on enough outputs from a closed model, at what point does it stop being distillation and start being replication?
Why This Keeps Me Up At Night
I am not a lawyer. I am a person who trains 100K-parameter models for fun and gets excited when they complete a sentence without repeating the word "the" forty-seven times. But this feels like it sits in a gray area that nobody wants to talk about.
Companies protect their models through API access only. No weights, no architecture details, no training data. But if I can query that API enough times and train my own model to behave the same way, did I just open source something that was never meant to be open?
The legal answer is probably complicated. The technical answer is maybe. The ethical answer depends on who you ask and how much they paid for their API subscription that month.
My Tiny Take
I think distillation as a hobby is great. It pushes the community forward. It gives people access to capabilities they would not have otherwise. It also probably makes some product managers very nervous.
I am not going to try this myself. My GPU budget is approximately zero dollars and my free time is spent debugging why my 1M parameter model thinks all numbers are prime. But I respect the people who do this work. They are essentially doing archival work for AI capabilities.
Also if anyone does manage to fully distill a closed model into something small and open, please let me know. I would love to run it on my laptop that already sounds like it is preparing for takeoff.